feat(datafusion): Implement IcebergWriteExec for DataFusion write support #1585
Conversation
Thanks @CTTY for this PR, it generally looks good! Just one minor nit.
```rust
impl IcebergWriteExec {
    pub fn new(table: Table, input: Arc<dyn ExecutionPlan>, schema: ArrowSchemaRef) -> Self {
```
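For context, a minimal sketch of what such a constructor could look like; the struct fields, the `PlanProperties` wiring, and the exact DataFusion APIs used here are assumptions, not the PR's actual code:

```rust
use std::sync::Arc;

use datafusion::arrow::datatypes::SchemaRef as ArrowSchemaRef;
use datafusion::physical_expr::EquivalenceProperties;
use datafusion::physical_plan::execution_plan::{Boundedness, EmissionType};
use datafusion::physical_plan::{ExecutionPlan, PlanProperties};
use iceberg::table::Table;

pub struct IcebergWriteExec {
    table: Table,
    input: Arc<dyn ExecutionPlan>,
    // Schema of the batches this node emits (the serialized data files),
    // not the schema of the rows being written.
    result_schema: ArrowSchemaRef,
    plan_properties: PlanProperties,
}

impl IcebergWriteExec {
    pub fn new(table: Table, input: Arc<dyn ExecutionPlan>, schema: ArrowSchemaRef) -> Self {
        // Mirror the input's partitioning so each input partition can
        // write its own files in parallel. The PlanProperties constructor
        // arguments vary across DataFusion versions.
        let plan_properties = PlanProperties::new(
            EquivalenceProperties::new(schema.clone()),
            input.properties().output_partitioning().clone(),
            EmissionType::Final,
            Boundedness::Bounded,
        );
        Self {
            table,
            input,
            result_schema: schema,
            plan_properties,
        }
    }
}
```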
Another point is that we should ensure the input schema matches the table's schema; otherwise we are doing schema evolution during write.
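A hedged sketch of the kind of check this suggests; `schema_to_arrow_schema` is a real function in iceberg-rust's arrow module, but this exact `check_input_schema` helper and its wiring are assumptions for illustration:

```rust
use datafusion::arrow::datatypes::SchemaRef as ArrowSchemaRef;
use iceberg::arrow::schema_to_arrow_schema;
use iceberg::table::Table;
use iceberg::{Error, ErrorKind, Result};

fn check_input_schema(table: &Table, input_schema: &ArrowSchemaRef) -> Result<()> {
    // Convert the table's current Iceberg schema to Arrow and require an
    // exact match, so a mismatched input fails fast instead of silently
    // evolving the table schema on write.
    let expected = schema_to_arrow_schema(table.metadata().current_schema())?;
    if input_schema.as_ref() != &expected {
        return Err(Error::new(
            ErrorKind::DataInvalid,
            format!(
                "input schema {:?} does not match table schema {:?}",
                input_schema, expected
            ),
        ));
    }
    Ok(())
}
```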
Column nullability and field types will be checked within execute_input_stream when it binds the Iceberg table schema to the input RecordBatch, so we don't need to worry about it now.
This may prevent us from doing any form of schema evolution, but I think that's a separate issue.
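To illustrate the schema binding being referred to, a hedged sketch of using DataFusion's `execute_input_stream`, which wraps the input and validates each batch against the sink schema (including nullability); the exact module path and signature may differ across DataFusion versions, and `bind_table_schema` is a hypothetical wrapper:

```rust
use std::sync::Arc;

use datafusion::arrow::datatypes::SchemaRef as ArrowSchemaRef;
use datafusion::common::Result;
use datafusion::execution::TaskContext;
use datafusion::physical_plan::{
    execute_input_stream, ExecutionPlan, SendableRecordBatchStream,
};

fn bind_table_schema(
    input: Arc<dyn ExecutionPlan>,
    table_arrow_schema: ArrowSchemaRef,
    partition: usize,
    context: Arc<TaskContext>,
) -> Result<SendableRecordBatchStream> {
    // Each batch pulled from the returned stream has been checked against
    // the Iceberg table's Arrow schema, which is the validation the
    // comment above relies on.
    execute_input_stream(input, table_arrow_schema, partition, context)
}
```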
Thanks @CTTY for this PR, LGTM!
Which issue does this PR close?
What changes are included in this PR?
Added IcebergWriteExec to write the input execution plan to Parquet files and return the serialized data files.
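For illustration, a hedged sketch of that flow under stated assumptions: `IcebergWriter` is iceberg-rust's writer trait (write batches, then close to collect the `DataFile` metadata), while `serialize_data_file` and the single-column output schema are hypothetical placeholders for however the PR actually encodes its result:

```rust
use std::sync::Arc;

use datafusion::arrow::array::StringArray;
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::common::{DataFusionError, Result as DFResult};
use datafusion::physical_plan::SendableRecordBatchStream;
use futures::StreamExt;
use iceberg::spec::DataFile;
use iceberg::writer::IcebergWriter;

// Hypothetical placeholder for however the PR encodes a DataFile so a
// downstream commit step can deserialize it.
fn serialize_data_file(file: &DataFile) -> String {
    format!("{:?}", file)
}

async fn write_and_collect(
    mut input: SendableRecordBatchStream,
    mut writer: impl IcebergWriter,
) -> DFResult<RecordBatch> {
    // Drain the input plan, writing each batch out as Parquet.
    while let Some(batch) = input.next().await {
        writer
            .write(batch?)
            .await
            .map_err(|e| DataFusionError::External(Box::new(e)))?;
    }
    // Closing the writer flushes the files and returns their DataFile metadata.
    let data_files = writer
        .close()
        .await
        .map_err(|e| DataFusionError::External(Box::new(e)))?;
    // Emit one row per written file so the plan's output can be collected
    // and committed in a follow-up operation.
    let serialized: StringArray = data_files
        .iter()
        .map(|f| Some(serialize_data_file(f)))
        .collect();
    let schema = Arc::new(Schema::new(vec![Field::new(
        "data_files",
        DataType::Utf8,
        false,
    )]));
    Ok(RecordBatch::try_new(schema, vec![Arc::new(serialized)])?)
}
```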
Are these changes tested?
Added unit tests.